Implement the new tuning API for DeviceRleDispatch #7669
bernhardmgruber merged 7 commits into NVIDIA:main
There are substantial SASS diffs, for example on SM75.
9b20ea4 to 01f7ef3
With the change from #7733, I asked Cursor:
and it indeed found the root cause. I am impressed.
// TODO(bgruber): I think we want `LengthT` instead of `int`
return make_default_policy(BLOCK_LOAD_DIRECT, sizeof(int), LOAD_LDG);
Retaining comment on old code
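To make the TODO above concrete, here is a minimal, self-contained sketch of the difference between passing `sizeof(int)` and `sizeof(LengthT)` to a policy factory. Everything in it is hypothetical: `rle_policy_sketch`, `make_default_policy_sketch`, and the two enums are stand-ins invented for illustration, not CUB's actual types, and the meaning of the second argument (a size in bytes) is an assumption based only on the snippet.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for illustration only; these are NOT the CUB definitions.
enum class load_algorithm { direct, warp_transpose };
enum class load_modifier { default_, ldg };

struct rle_policy_sketch
{
  load_algorithm load;
  std::size_t length_size; // assumed meaning: size in bytes of the run-length type
  load_modifier modifier;
};

// Hypothetical factory mirroring the shape of the call in the snippet.
constexpr rle_policy_sketch
make_default_policy_sketch(load_algorithm load, std::size_t length_size, load_modifier modifier)
{
  return {load, length_size, modifier};
}

// Mirrors the snippet: the size is hard-coded to sizeof(int).
constexpr rle_policy_sketch policy_with_int()
{
  return make_default_policy_sketch(load_algorithm::direct, sizeof(int), load_modifier::ldg);
}

// What the TODO suggests: derive the size from the actual length type.
template <typename LengthT>
constexpr rle_policy_sketch policy_with_length_t()
{
  return make_default_policy_sketch(load_algorithm::direct, sizeof(LengthT), load_modifier::ldg);
}
```

With a 64-bit `LengthT`, the two variants would feed different sizes into the policy selection, which is presumably why the TODO exists; whether that changes the chosen tuning in practice depends on the real implementation.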
5d53482 to 490d98b
🥳 CI Workflow Results
🟩 Finished in 21h 20m: Pass: 100%/249 | Total: 9d 05h | Max: 3h 57m | Hits: 71%/153868
Depends on:
cub.bench.run_length_encode.non_trivial_runs.base on SM75;80;86;90;100
Fixes: #7532